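Automatic medical question summarization can greatly help systems understand consumer health questions and retrieve correct answers. Seq2Seq models based on maximum likelihood estimation (MLE) have been applied to this task, but they face two general problems: the model cannot capture the question focus well, and the traditional MLE strategy lacks the ability to understand sentence-level semantics. To alleviate these problems, we propose a novel question focus-driven contrastive learning framework (QFCL). Specifically, we propose a simple and effective approach to generate hard negative samples based on the question focus, and we apply contrastive learning at both the encoder and the decoder to obtain better sentence-level representations. On three medical benchmark datasets, our proposed model achieves new state-of-the-art results, with performance gains of 5.33, 12.85, and 3.81 points over the baseline BART model on the three datasets, respectively. Further human judgment and detailed analysis show that our QFCL model learns better sentence representations, can distinguish different sentence meanings, and generates high-quality summaries by capturing the question focus.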
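The abstract does not state which contrastive objective QFCL uses, so the following is only a minimal sketch assuming an InfoNCE-style loss over cosine similarities. The function names, the toy vectors, and the temperature value are illustrative assumptions, not the authors' implementation. It shows why hard negatives (sentences close to the anchor in embedding space) contribute a larger loss than easy ones.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss (an assumed objective, not taken from the paper).

    Pulls the anchor towards the positive and pushes it away from the
    negatives; the temperature value is an arbitrary illustrative choice.
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))

# Toy 2-D "sentence embeddings" (hypothetical values).
anchor = [1.0, 0.0]
positive = [0.9, 0.1]
easy_negative = [-1.0, 0.0]   # points away from the anchor
hard_negative = [0.8, 0.6]    # close to the anchor, hence harder to separate

loss_easy = info_nce(anchor, positive, [easy_negative])
loss_hard = info_nce(anchor, positive, [hard_negative])
```

In this toy setting the hard negative yields a larger loss than the easy one, which is the usual motivation for generating hard negatives rather than sampling random ones.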
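Video frame interpolation is a challenging task because of ever-changing real-world scenes. Previous methods often compute bidirectional optical flows and then predict the intermediate optical flows under a linear motion assumption, which leads to isotropic intermediate flow generation. Follow-up research obtained anisotropic adjustment through estimated higher-order motion information and extra frames. Because they rely on motion assumptions, these methods struggle to model complicated motion in real scenes. In this paper, we propose A^2OF, an end-to-end training method for video frame interpolation with event-driven anisotropic adjustment of optical flows. Specifically, we use events to generate optical flow distribution masks for the intermediate optical flow, which can model the complicated motion between two frames. Our proposed method outperforms previous methods in video frame interpolation and takes event-based video interpolation to a higher stage.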
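The difference between isotropic and anisotropic intermediate flow can be illustrated with a toy sketch. Under the linear motion assumption, the flow from frame 0 to time t is the full flow scaled by the same factor t everywhere; an anisotropic adjustment instead lets each pixel and each axis have its own weight. The per-axis mask format below is a hypothetical simplification for illustration, and the abstract does not specify how A^2OF applies its event-derived masks.

```python
def linear_intermediate_flow(flow_0_to_1, t):
    """Isotropic case: every displacement (dx, dy) is scaled by the same t."""
    return [(t * dx, t * dy) for dx, dy in flow_0_to_1]

def masked_intermediate_flow(flow_0_to_1, masks):
    """Anisotropic case: each pixel has its own (hypothetical) per-axis weights."""
    return [
        (mx * dx, my * dy)
        for (dx, dy), (mx, my) in zip(flow_0_to_1, masks)
    ]

# A single-pixel flow field moving 4 px right and 2 px up between two frames.
flow = [(4.0, -2.0)]

# Linear motion: halfway in time means halfway along both axes.
halfway = linear_intermediate_flow(flow, 0.5)

# Non-linear motion: the pixel has covered 25% of its x-motion but 75% of its
# y-motion at the intermediate time (weights chosen arbitrarily).
adjusted = masked_intermediate_flow(flow, [(0.25, 0.75)])
```

Setting both mask weights equal to t recovers the linear case, so the isotropic model is the special case of the anisotropic one.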
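Graph neural networks (GNNs) have gained great popularity in tackling various analytical tasks on graph-structured data (i.e., networks). Typical GNNs and their variants follow a message-passing scheme that obtains network representations through feature propagation along the network topology. However, they ignore the rich textual semantics (e.g., local word sequences) that exist in many real-world networks. Existing methods for text-rich networks integrate textual semantics mainly by using internal information such as topics or phrases/words, which often fails to mine the text semantics comprehensively and limits the reciprocal guidance between network structure and text semantics. To address these problems, we propose a novel text-rich graph neural network with external knowledge (TeKo) to take full advantage of both the structural and the textual information in text-rich networks. Specifically, we first present a flexible heterogeneous semantic network that incorporates high-quality entities and the interactions among documents and entities. We then introduce two types of external knowledge, structured triplets and unstructured entity descriptions, to gain deeper insight into textual semantics. We further design a reciprocal convolutional mechanism for the constructed heterogeneous semantic network, which enables network structure and textual semantics to enhance each other collaboratively and to learn high-level network representations. Extensive experimental results on four public text-rich networks and a large-scale e-commerce search dataset show that TeKo outperforms state-of-the-art baselines.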
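Generative adversarial networks (GANs) have been successfully applied to perceptual single image super-resolution (SISR). However, GANs often tend to generate images whose high-frequency details are inconsistent with the real ones. Inspired by conventional detail enhancement algorithms, we propose a novel form of prior knowledge, the detail prior, to help GANs alleviate this problem and restore more realistic details. The proposed method, named DSRGAN, includes a well-designed detail extraction algorithm that captures the most important high-frequency information in images. Two discriminators are then used to supervise restoration in the image domain and the detail domain, respectively. DSRGAN merges the restored details into the final output through detail enhancement. The special design of DSRGAN takes advantage of both model-based conventional algorithms and data-driven deep learning networks. Experimental results show that DSRGAN outperforms state-of-the-art SISR methods on perceptual metrics while achieving comparable results on fidelity metrics. Following DSRGAN, other conventional image processing algorithms could be incorporated into deep learning networks to form model-based deep SISR.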
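Semi-supervised video object segmentation (VOS) aims to segment a few moving objects in a video sequence, where these objects are specified by annotating the first frame. Many existing semi-supervised VOS methods use optical flow to improve segmentation accuracy. However, optical flow-based semi-supervised VOS methods cannot run in real time because of the high complexity of optical flow estimation. This study proposes FAMINet, which consists of a feature extraction network (F), an appearance network (A), a motion network (M), and an integration network (I), to address this problem. The appearance network outputs an initial segmentation result based on the static appearance of objects. The motion network estimates the optical flow with very few parameters, which are optimized rapidly by an online memorizing algorithm named relaxed steepest descent. The integration network refines the initial segmentation result using the optical flow. Extensive experiments show that FAMINet outperforms other state-of-the-art semi-supervised VOS methods on the DAVIS and YouTube-VOS benchmarks and achieves a good trade-off between accuracy and efficiency. Our code is available at https://github.com/liuziyang123/faminet.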
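Spatially varying exposure (SVE) is a promising choice for high-dynamic-range (HDR) imaging (HDRI). SVE-based HDRI, known as single-shot HDRI, is an efficient solution for avoiding ghosting artifacts. However, restoring a full-resolution HDR image from a real-world SVE image is very challenging because: a) in a Bayer pattern, the camera captures only one-third of the pixels at each of the varying exposures, and b) some of the captured pixels are over- or under-exposed. For the former challenge, a spatially varying convolution (SVC) is designed to process Bayer images that carry varying exposures. For the latter, an exposure-guidance method is proposed to prevent interference from over- and under-exposed pixels. Finally, a joint demosaicing and HDRI deep learning framework is formalized to include the two novel components and to realize end-to-end single-shot HDRI. Experiments show that the proposed end-to-end framework avoids the problem of cumulative errors and surpasses related state-of-the-art methods.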
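This work proposes a new computational framework for learning an explicit generative model for real-world datasets. In particular, we propose to learn a closed-loop transcription between a multi-class, multi-dimensional data distribution and a linear discriminative representation (LDR) in a feature space that consists of multiple independent multi-dimensional linear subspaces. We argue that the optimal encoding and decoding mappings sought can be formulated as the equilibrium point of a two-player minimax game between the encoder and the decoder. A natural utility function for this game is the so-called rate reduction, a simple information-theoretic measure of the distance between mixtures of subspace-like Gaussians in the feature space. Our formulation draws inspiration from closed-loop error feedback in control systems and avoids the expensive evaluation and minimization of approximated distances between arbitrary distributions in either the data space or the feature space. To a large extent, this new formulation unifies the concepts and benefits of auto-encoding and GANs and naturally extends them to the setting of learning a representation that is both discriminative and generative for multi-class and multi-dimensional real-world data. Our extensive experiments on many benchmark image datasets demonstrate the great potential of this new closed-loop formulation: under fair comparison, the visual quality of the learned decoder and the classification performance of the encoder are competitive with, and often better than, existing methods based on GANs, VAEs, or a combination of both. We notice that the features learned in this way for different classes are explicitly mapped onto approximately independent principal subspaces in the feature space, and that diverse visual attributes within each class are modeled by the independent principal components within each subspace.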
In deep learning, neural networks serve as noisy channels between input data and its representation. This perspective naturally relates deep learning to the pursuit of constructing channels with optimal performance in information transmission and representation. While considerable effort is concentrated on realizing optimal channel properties during network optimization, we study a frequently overlooked possibility: that neural networks can be initialized toward optimal channels. Our theory, consistent with experimental validation, identifies the primary mechanics underlying this possibility and suggests intrinsic connections between statistical physics and deep learning. Unlike conventional theories that characterize neural networks using the classic mean-field approximation, we offer an analytic proof that this widely applied simplification scheme is not valid for studying neural networks as information channels. To fill this gap, we develop a corrected mean-field framework applicable to characterizing the limiting behaviors of information propagation in neural networks without strong assumptions on inputs. Based on it, we propose an analytic theory to prove that mutual information maximization is realized between inputs and propagated signals when neural networks are initialized at dynamic isometry, a case where information transmits via norm-preserving mappings. These theoretical predictions are validated by experiments on real neural networks, suggesting the robustness of our theory against finite-size effects. Finally, we analyze our findings with information bottleneck theory to confirm the precise relations among dynamic isometry, mutual information maximization, and optimal channel properties in deep learning.
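The notion of a norm-preserving mapping can be illustrated in the simplest setting, a deep linear network whose weight matrices are orthogonal. The sketch below is only that special case, written in plain Python with hypothetical helper names; it does not reproduce the corrected mean-field framework or the nonlinear networks studied in the paper. It builds random orthogonal matrices with a Gram-Schmidt procedure and checks that a signal keeps its Euclidean norm after many layers.

```python
import math
import random

def random_orthogonal(n, seed=0):
    """Return an n x n matrix with orthonormal rows (Gram-Schmidt on Gaussian draws)."""
    rng = random.Random(seed)
    rows = []
    while len(rows) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Remove the components of v along the rows accepted so far.
        for u in rows:
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-8:  # skip (near-)degenerate draws
            rows.append([a / norm for a in v])
    return rows

def matvec(w, x):
    """Multiply matrix w (list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in w]

def l2_norm(x):
    return math.sqrt(sum(a * a for a in x))

def propagate(x, depth, seed=0):
    """Pass x through `depth` linear layers, each with a fresh orthogonal weight."""
    for layer in range(depth):
        x = matvec(random_orthogonal(len(x), seed + layer), x)
    return x

signal = [1.0, -2.0, 0.5, 3.0, 0.0, 1.5, -1.0, 2.5]
output = propagate(signal, depth=10)
norm_drift = abs(l2_norm(output) - l2_norm(signal))
```

Because a square matrix with orthonormal rows is orthogonal, each layer leaves the norm unchanged up to floating-point error, so `norm_drift` stays negligible regardless of depth. With generic Gaussian weights the norm would instead grow or shrink with depth.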
Batteries play an essential role in the modern energy ecosystem and are widely used in daily applications such as cell phones and electric vehicles. For many applications, the health status of batteries plays a critical role in system performance by indicating when maintenance is efficient and when replacement is due. Directly modeling an individual battery with a computational model based on physical rules can be inefficient, given the difficulty of building such a model and the computational effort of tuning and running it, especially on the edge. With the rapid development of sensor technology (which provides more insight into the system) and machine learning (which builds capable yet fast models), it is now possible to build a data-driven model of battery health status directly from historical battery data, both local and remote, and to accurately predict the future health status of a local battery. Nevertheless, most data-driven methods are trained on local battery data and lack the ability to extract common properties, such as regeneration and degradation, from the life spans of other remote batteries. In this paper, we use a Gaussian process dynamical model (GPDM) to build a data-driven model of battery health status and propose a knowledge transfer method that extracts common properties from the life spans of all batteries to accurately predict battery health status with and without features extracted from the local battery. On modern benchmark problems, the proposed method outperforms state-of-the-art methods by significant margins in accuracy and is able to accurately predict the regeneration process.
The spread of rumors along with breaking events seriously hinders the truth in the era of social media. Previous studies reveal that, because annotated resources are lacking, rumors presented in minority languages are hard to detect. Furthermore, unforeseen breaking events not covered in yesterday's news exacerbate the scarcity of data resources. In this work, we propose a novel zero-shot framework based on prompt learning to detect rumors that fall in different domains or are presented in different languages. More specifically, we first represent rumors circulated on social media as diverse propagation threads, then design a hierarchical prompt encoding mechanism to learn language-agnostic contextual representations for both prompts and rumor data. To further enhance domain adaptation, we model domain-invariant structural features from the propagation threads to incorporate structural position representations of influential community responses. In addition, a new virtual response augmentation method is used to improve model training. Extensive experiments on three real-world datasets demonstrate that our proposed model achieves much better performance than state-of-the-art methods and exhibits a superior capacity for detecting rumors at early stages.